1. Correct Spectral Baseline

Why correct baseline?

  • In spectroscopy, accurate interpretation of spectral data depends on a reliable baseline.

  • A spectral baseline is the signal level at which the sample absorbs no light. Ideally, it is a straight line at zero.

  • In reality, the baseline is distorted by inherent instrument noise and by scattering from the sample.

  • Identifying and correcting the spectral baseline minimises this distortion, which is essential for quantitative spectroscopy, where the height and area of spectral peaks are relevant to the analysis. Without a flattened baseline, these values can be greatly under- or over-estimated.

  • Most importantly, accurate quantitative spectroscopic analysis is crucial for comparison with other data sets!
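The effect of an uncorrected baseline on peak height and area can be seen in a small synthetic example. This is only a sketch: the Gaussian peak, the sloping baseline, and the helper `trapezoid_area` are made up for illustration and are not part of Xpectra or the methane data.

```python
import numpy as np

# Synthetic example (not the methane data): one Gaussian peak on a sloping baseline
x = np.linspace(0.0, 10.0, 1001)
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)  # true peak with height 1
baseline = 0.2 + 0.03 * x                      # hypothetical baseline drift
signal = peak + baseline


def trapezoid_area(y, x):
    """Integrate y over x with the trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


true_area = trapezoid_area(peak, x)
raw_area = trapezoid_area(signal, x)                   # baseline left in
corrected_area = trapezoid_area(signal - baseline, x)  # baseline removed

print(f"true area       = {true_area:.3f}")
print(f"raw area        = {raw_area:.3f}")
print(f"corrected area  = {corrected_area:.3f}")
print(f"raw peak height = {signal.max():.3f} (true height is 1.000)")
```

Here the baseline is known exactly, so subtracting it recovers the true area. With real data the baseline has to be estimated, which is what the rest of this tutorial does.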

Key Objectives in this tutorial:

  1. Implement the ARPLS and ALS algorithms to remove the baseline

  2. Compare the baseline-corrected spectra to choose the best method

  3. Visualize the results

[1]:
# Import necessary modules
import os
import numpy as np
import pandas as pd
from Xpectra.SpecFitAnalyzer import *
from Xpectra.SpecStatVisualizer import plot_spectra_errorbar_bokeh, plot_compare_baselines

2.1 Load and preprocess the \(\mathrm{CH_{4}}\) lab spectrum

The laboratory spectrum has two columns:

  1. Wavenumber [\(\mathrm{cm^{-1}}\)]

  2. Signal (arbitrary units)

[2]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Import methane spectrum
methane_spectrum = pd.read_csv(os.path.join(__reference_data_path__, 'datasets','Spectrum_CH4_100Torr.csv'))

x = 10**7/methane_spectrum['W'].to_numpy() # Wavenumber
y = methane_spectrum['I'].to_numpy() # Intensity
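The expression `10**7 / W` in the cell above is the usual wavelength-to-wavenumber conversion when `W` is a wavelength in nanometres (1 cm = 10^7 nm). The tutorial does not state the unit of the `W` column, so treat that as an assumption. The helper name and the sample values below are illustrative only.

```python
import numpy as np


def wavelength_nm_to_wavenumber_cm(wavelength_nm):
    """Convert wavelength in nm to wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / np.asarray(wavelength_nm, dtype=float)


# Illustrative wavelengths only, not values read from the methane file
wavelength_nm = np.array([3450.0, 3400.0, 3350.0])
wavenumber_cm = wavelength_nm_to_wavenumber_cm(wavelength_nm)

print(wavenumber_cm)
```

Because the conversion is a reciprocal, the ordering flips: decreasing wavelength gives increasing wavenumber.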

\(\rightarrow\) First, we instantiate the class with our wavenumber and signal arrays, as well as our reference_data path.

[3]:
# Initialize SpecFitAnalyzer
specfit = SpecFitAnalyzer(wavenumber_values= x,
                          signal_values = y,
                          absorber_name = 'CH4',
                          __reference_data__ = __reference_data_path__)
[4]:
# Check for NAN or negative values and update x and y with trimmed arrays
specfit.check_negative_nan()
No NAN values.
9614 negative values found (7.78% of data)
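The internals of `check_negative_nan` are not shown in this tutorial. As a rough idea of the kind of check behind the counts above, here is a minimal NumPy sketch. The helper name `count_nan_and_negative` and the sample array are hypothetical and are not part of Xpectra.

```python
import numpy as np


def count_nan_and_negative(signal):
    """Return (number of NaNs, number of negatives, percent negative)."""
    signal = np.asarray(signal, dtype=float)
    n_nan = int(np.isnan(signal).sum())
    # NaN compares False against 0, so NaNs are not counted as negative
    n_negative = int((signal < 0).sum())
    pct_negative = 100.0 * n_negative / signal.size
    return n_nan, n_negative, pct_negative


# Made-up signal with one NaN and two negative values
signal = np.array([0.4, -0.1, np.nan, 0.7, -0.3])
n_nan, n_negative, pct_negative = count_nan_and_negative(signal)

print(f"{n_nan} NaN values, {n_negative} negative values ({pct_negative:.2f}% of data)")
```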
[5]:
# Convert to absorption

y = np.exp(-y)
[6]:
# Update instance
specfit.signal_values = y

2.2 Visualize spectrum

\(\rightarrow\) Plot spectrum interactively using Bokeh

[7]:
plot_spectra_errorbar_bokeh(wavenumber_values = x,
                            signal_values = y,
                            absorber_name = 'CH4',
                            plot_type = 'line')

2.3 Correct Spectral Baseline

The Xpectra.SpecFitAnalyzer module has three main purposes:

  1. Process spectral data ✅

  2. \(\rightarrow\) Fit and correct spectral baseline \(\leftarrow\)

  3. Identify and fit spectral peaks

\(\rightarrow\) At this step, we use functions to model the shape of the baseline. Once we derive the fitted baseline, we can subtract it from the signal to create a baseline-corrected signal.

  • 2.3.1 ARPLS: Asymmetrically reweighted penalized least squares smoothing baseline correction algorithm

  • 2.3.2 ALS: Asymmetric least squares smoothing baseline correction algorithm
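To make the "fit the baseline, then subtract it" idea concrete, here is a minimal NumPy sketch of asymmetric least squares (ALS) smoothing on synthetic data. It is not Xpectra's implementation: the function name `als_baseline`, the parameters `lam`, `p`, and `n_iter`, and their default values are illustrative assumptions. Dense matrices are used for brevity, so this only suits short signals. A spectrum with more than 100,000 points would need sparse matrices.

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.001, n_iter=20):
    """Estimate a smooth baseline under upward-pointing peaks (dense ALS sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator: row i is z[i] - 2*z[i+1] + z[i+2]
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)  # large lam gives a stiffer baseline
    w = np.ones(n)
    for _ in range(n_iter):
        # Solve (W + lam * D^T D) z = W y for the current weights
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (likely peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic spectrum: linear baseline plus one Gaussian peak
x = np.linspace(0.0, 100.0, 400)
true_baseline = 0.5 + 0.002 * x
y = true_baseline + np.exp(-0.5 * ((x - 50.0) / 2.0) ** 2)

baseline_fit = als_baseline(y)
corrected = y - baseline_fit  # baseline-corrected signal

print(f"max corrected signal: {corrected.max():.3f}")
```

The asymmetry is the key design choice: points above the current baseline estimate are nearly ignored, so the fit settles under the peaks instead of running through them.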

2.3.1 ARPLS Method

\(\rightarrow\) Run the ARPLS baseline correction, visualize the result, and save the plot as a PDF

[10]:
# Fit baseline using the ARPLS algorithm
specfit.arpls(__plot__ = True,
              __save_plots__ = True,
              __print__ = True)
/Users/familymader/Xpectra_project/Xpectra/Xpectra/SpecFitAnalyzer.py:593: SparseEfficiencyWarning: splu converted its input to CSC format
  lu = splu(WH)  # Use sparse LU decomposition
/Users/familymader/Xpectra_project/Xpectra/Xpectra/SpecFitAnalyzer.py:599: RuntimeWarning: overflow encountered in exp
  wt = 1. / (1 + np.exp(2 * (d - (2 * s - m)) / s))
[Figure: ARPLS baseline fit for the CH4 spectrum]
Fitting parameters...
Metrics...
[11]:
specfit.baseline_type
[11]:
'arpls'
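The `RuntimeWarning: overflow encountered in exp` printed by the ARPLS cell comes from the logistic weight expression shown in the warning. For very large positive residuals `np.exp` overflows to `inf`, and the weight then evaluates to 0, so the value is still what the formula intends. The identity 1 / (1 + exp(2u)) = 0.5 * (1 - tanh(u)) gives the same weights without the overflow. The sketch below is a suggestion, not Xpectra code: `arpls_weights` is a hypothetical helper, and the values of `d`, `m`, and `s` are made up.

```python
import numpy as np


def arpls_weights(d, m, s):
    """Logistic weights 1 / (1 + exp(2 * (d - (2*s - m)) / s)), computed via tanh."""
    u = (np.asarray(d, dtype=float) - (2.0 * s - m)) / s
    return 0.5 * (1.0 - np.tanh(u))


# Made-up residuals (signal minus baseline); the last one mimics a very tall peak
d = np.array([-0.02, 0.0, 0.05, 50.0])
m, s = -0.01, 0.005  # illustrative location and scale, not taken from the library

# Treat overflow as an error to show that the tanh form does not trigger it
with np.errstate(over="raise"):
    weights = arpls_weights(d, m, s)

print(weights)
```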

2.3.2 ALS Method

\(\rightarrow\) Do the same for ALS: run the ALS baseline correction, visualize the result, and save the plot as a PDF

[12]:
# Fit baseline using the ALS algorithm
specfit.als(__plot__ = True,
            __save_plots__ = True,
            __print__ = True)
[Figure: ALS baseline fit for the CH4 spectrum]
Fitting parameters...
Metrics...
[13]:
specfit.baseline_type
[13]:
'als'

2.3.3 Compare The Corrected Baselines

\(\rightarrow\) As a part of the fitting process, we determine which method models the baseline most effectively.

\(\rightarrow\) Let’s perform a qualitative analysis of the baseline-correction methods by overplotting the residuals from the ARPLS and ALS methods.

[14]:
plot_compare_baselines(wavenumber_values = x,
                       corrected_signal_1 = specfit.y_baseline_corrected_ARPLS,
                       baseline_type_1 = 'ARPLS',
                       corrected_signal_2 = specfit.y_baseline_corrected_ALS,
                       baseline_type_2 = 'ALS'
                      )

\(\rightarrow\) In this case, ARPLS has a cleaner zero point after baseline subtraction, so we choose this as the baseline-correction method. Let’s update the class with our choice:

[15]:
# Define the best baseline corrected output
specfit.y_baseline_corrected = specfit.y_baseline_corrected_ARPLS

# Update chosen baseline type
specfit.baseline_type = 'ARPLS'
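The visual comparison can be backed by a simple number: the RMS of each corrected signal inside a window that contains no peaks, where a well-corrected spectrum should sit close to zero. The sketch below uses synthetic stand-ins for the two corrected signals. The helper `rms_in_window` and the window limits are hypothetical and are not part of Xpectra.

```python
import numpy as np


def rms_in_window(x, corrected, x_min, x_max):
    """Root-mean-square of the corrected signal for x_min <= x <= x_max."""
    mask = (x >= x_min) & (x <= x_max)
    return float(np.sqrt(np.mean(corrected[mask] ** 2)))


# Synthetic stand-ins for two baseline-corrected spectra
rng = np.random.default_rng(0)
x = np.linspace(2900.0, 2980.0, 801)
peak = np.exp(-0.5 * ((x - 2940.0) / 1.0) ** 2)
noise = 0.001 * rng.standard_normal(x.size)
corrected_a = peak + noise         # method A: baseline fully removed
corrected_b = peak + noise + 0.02  # method B: small offset left behind

# Window chosen by eye to contain no peaks in this synthetic example
rms_a = rms_in_window(x, corrected_a, 2900.0, 2920.0)
rms_b = rms_in_window(x, corrected_b, 2900.0, 2920.0)

print(f"RMS in peak-free window: A = {rms_a:.4f}, B = {rms_b:.4f}")
```

The lower value indicates the cleaner zero level, under the assumption that the chosen window really is free of absorption features.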

2.4 Save the baseline corrected spectra

\(\rightarrow\) Create DataFrame with original data, processed data, and baseline-corrected data

[16]:
# Create DataFrame
data = {
    'original_x': x,
    'original_y': y,
    'cleaned_x': specfit.x_cleaned,
    'cleaned_y': specfit.y_cleaned,
    'baseline_corrected_x': x,
    'baseline_corrected_y': specfit.y_baseline_corrected,
}

df = pd.DataFrame.from_dict(data, orient='index').transpose()
[17]:
display(df)
        original_x   original_y  cleaned_x    cleaned_y  baseline_corrected_x  baseline_corrected_y
0       2898.543060  0.643845    2898.543060  0.440297   2898.543060           0.014491
1       2898.543908  0.646336    2898.543908  0.436436   2898.543908           0.017204
2       2898.544766  0.645778    2898.544766  0.437300   2898.544766           0.016869
3       2898.545133  0.639101    2898.545133  0.447693   2898.545133           0.010415
4       2898.545638  0.630384    2898.545638  0.461426   2898.545638           0.001921
...     ...          ...         ...          ...        ...                   ...
123522  2985.057227  0.635260    NaN          NaN        2985.057227           -0.000147
123523  2985.058132  0.635798    NaN          NaN        2985.058132           0.000649
123524  2985.058876  0.633810    NaN          NaN        2985.058876           -0.001080
123525  2985.059515  0.632179    NaN          NaN        2985.059515           -0.002451
123526  2985.060049  0.633246    NaN          NaN        2985.060049           -0.001124

123527 rows × 6 columns
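The `NaN` entries at the bottom of the `cleaned_x` and `cleaned_y` columns are consistent with the cleaned arrays being shorter than the original ones: building the DataFrame with `orient='index'` and then transposing pads shorter columns with `NaN`. A small sketch with made-up arrays shows the padding behaviour. The variable names `df_small` and `df_series` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up arrays of unequal length
data = {
    "original_x": np.array([1.0, 2.0, 3.0, 4.0]),
    "cleaned_x": np.array([1.0, 2.0, 3.0]),  # one point dropped during cleaning
}

# Same construction as in the cell above: rows first, then transpose
df_small = pd.DataFrame.from_dict(data, orient="index").transpose()
print(df_small)

# Alternative: one Series per column is padded with NaN in the same way
df_series = pd.DataFrame({name: pd.Series(values) for name, values in data.items()})
print(df_series)
```

Note that the padding is always at the end, so rows of the cleaned columns only line up with the original columns if the removed points were at the end of the arrays.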

\(\rightarrow\) Save the spectra to a CSV file

[18]:
# Define file name
file_name = f"{specfit.baseline_type.lower()}_baseline_corrected_methane_spectrum.csv"

# Save DataFrame to CSV
df.to_csv(os.path.join(__reference_data_path__,'processed_data',file_name), index=False)